SDA 4.1 Documentation for HARCIMPORT


NAME

HARCimport - Import HARC file specifications into SDA 4.1

USAGE

java -jar harcimport.jar path_of_options_file

(This program is not part of the regular Version 4.1 package, but a special distribution package is available.)

DESCRIPTION

HARCIMPORT is a batch-mode alternative to the interactive SDA Manager. The program (named 'harcimport.jar') is a Java program that reads a HARC file and imports its contents into a Version 4 SDA database. Before HARCIMPORT can be used for the first time, the system administrator must edit the 'harcimport.jar' file during installation, as described in the installation instructions below.

Before running HARCIMPORT, you must prepare a batch options file, whose contents are described in this document. The full pathname of that file is passed to the Java program as the last argument on the Java command line. An example of the Java command line is given below.

Note that the configuration specifications for Version 4 of SDA are somewhat different from Version 3 of SDA. Not all of the Version 3 HARC file specifications are relevant for setting up an SDA dataset in Version 4. HARC file specifications that are no longer applicable to Version 4 are simply ignored by HARCIMPORT.

There is also one incompatibility between the HARC files of Version 3 and the equivalent HARC files of Version 4. This difference has to do with specifying the location of a file available for downloading, using the 'DLFILE=' keyword. In Version 3, the location of such a file was given as a URL. In Version 4, however, the location must be given as an absolute pathname. Therefore, before using HARCIMPORT with a Version 3 HARC file, you need to search for 'DLFILE=' keywords in the HARC file and change the specified URLs to absolute pathname locations.
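
As a starting point for that search, the following shell sketch lists the 'DLFILE=' lines whose values still look like URLs. The function name 'find_dlfile_urls' is hypothetical (it is not part of the HARCIMPORT distribution), and the pattern assumes that the URL appears on the same line as the keyword; adjust it if your HARC file is laid out differently.

```shell
# Hypothetical helper (not part of the HARCIMPORT distribution).
# Prints, with line numbers, each DLFILE= line that still contains a URL.
find_dlfile_urls() {
    # $1 = pathname of the HARC file to inspect
    # First select the DLFILE= lines, then keep those with "scheme://".
    grep -n 'DLFILE=' "$1" | grep -E '[A-Za-z][A-Za-z0-9+.-]*://'
}
```

For example, '% find_dlfile_urls /sa/sdatest/harcsda' prints only the lines that still need to be changed. Lines that already give an absolute pathname are not printed.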

See the HARC file documentation for information on the applicable specifications and for a cross-reference between the fields used by the Version 4 SDA Manager and the HARC file specifications.


KEYWORDS

The options file contains the specifications for the import. Each specification has the form "keyword = value", with one keyword per line. The keywords may be given in any order.

Keywords must be given in lower case, and all of the characters are significant. If a specification requires multiple lines, put a backslash (\) at the end of each line that is continued on the next line. (See the examples below.)


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

harc=         full pathname of the HARC file  REQUIRED

globalopts=   ID of global options            REQUIRED
               (see below)

datasets=     IDs of the datasets in          Import ALL datasets
               the HARC file to import
               and/or REPLACE (see below)

groups=       permission groups to which      No permissions
               the imported datasets will       granted
               be added (see below)
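
The two REQUIRED keywords can be checked before a run. The shell sketch below is only an illustration: the function name 'check_options_file' is hypothetical, it is not part of HARCIMPORT, and it tests only that 'harc' and 'globalopts' each begin a specification line in lower case. It does not validate the values given after them.

```shell
# Hypothetical pre-flight check (not part of the HARCIMPORT distribution).
# Returns 0 if both REQUIRED keywords are present, 1 otherwise.
check_options_file() {
    # $1 = pathname of the batch options file
    rc=0
    for key in harc globalopts; do
        # The match is case-sensitive, because keywords must be lower case.
        if ! grep -q -E "^[[:space:]]*${key}[[:space:]]*=" "$1"; then
            echo "missing required keyword: ${key}=" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

A call such as '% check_options_file /var/sdatest/harcimportbatch.txt' prints one message for each REQUIRED keyword that is missing.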


GLOBAL OPTIONS

The first part of a HARC file defines global options, which apply to each of the datasets defined in that HARC file. When you import a HARC file, you can decide EITHER to use the global options defined in the HARC file OR to use some other previously defined set of global options. It is important to understand that HARCIMPORT will not modify a pre-existing set of global options. You can use a pre-existing set of global options, or save the global options in the HARC file as a new set, but you cannot modify a pre-existing set, because doing so could have unforeseen consequences for previously defined datasets. If you really want to modify a pre-existing set of global options, you can do that with the interactive SDA Manager.

DATASETS

HARCIMPORT will add ALL of the datasets in the HARC file to the SDA Version 4 database, UNLESS you name specific datasets after the 'datasets=' keyword.

If one of the datasets is already in the database, it will first be deleted and then it will be replaced using the specifications given in the HARC file.


GROUPS

SDA controls access to various datasets by granting access permission only to the members of previously defined -- and named -- groups of users. If you want the datasets in this HARC file to be accessible to members of specific groups, give the name/ID of the group(s) after the 'groups=' keyword.

If you want the datasets to be available to everyone who has access to your data archive, specify the 'anonymous' group like this: groups=anonymous.

If no groups are specified, the imported datasets will not be accessible to anyone until you later use the SDA Manager to grant access to specific groups.


EXAMPLES OF OPTIONS FILES

1. Import ALL datasets in the HARC file and make them accessible to ALL users


     harc = /sa/sdatest/harcsda
     globalopts = standard_options

     groups = anonymous

2. Import only SOME of the datasets, and restrict them to members of the 'research1' and 'research2' groups.

(Notice the backslash in the datasets specification to continue the list of IDs onto a second line.)

     harc = /sa/sdatest/harcsda
     globalopts = standard_options

     datasets = study1 study2 study3 \
        study7 study9 study11
     groups = research1 research2

3. Import all of the datasets, and create a NEW global options specification.

     harc = /sa/sdatest/harcsda
     groups = research1 research2

     globalopts = new_globaloptions


INSTALLATION INSTRUCTIONS

Before using HARCIMPORT for the first time, the contents of the distribution package must be unpacked and some minor editing of the 'harcimport.jar' file must be done. Here are the steps to be taken:

Uncompressing/Installing the package (on Linux):

The 'harcimport.tgz' file is a tar'd/gzip'd file. Copy it to the directory where you want to use it and then extract the contents of the file with:

     % tar xzvf harcimport.tgz

You should now see a 'harcimport.jar' file and a 'lib' directory (containing several other auxiliary jar files). You should also see a 'harcimport.properties' file that contains database configuration information.

The 'harcimport.properties' file needs to be edited to conform to your MySQL installation's settings. Then it needs to be copied into the harcimport.jar file using the Java JDK 'jar' utility. These steps are explained next.

Editing the 'harcimport.properties' file

The specifications in the unedited 'harcimport.properties' file are:

     db.driverClassName=com.mysql.jdbc.Driver
     db.url=jdbc:mysql://localhost:3306/sdaschema
     db.username=myusername
     db.password=mypassword

The first two specifications ("db.driverClassName" and "db.url") will probably not need to be edited. The name "com.mysql.jdbc.Driver" is a hard-wired name and should not be altered. The "db.url" specification would only need to be changed in some rare instances -- for example, if you run MySQL on a non-standard port. However, the "myusername" and "mypassword" entries will need to be replaced with a username and password that are valid for your MySQL installation.

Updating the 'harcimport.jar' file

Once the 'harcimport.properties' file has been edited, you can copy it into the 'harcimport.jar' file with the following command:

     jar -uvf harcimport.jar harcimport.properties

The 'jar' utility program is a standard part of the Java JDK and is in the JDK's 'bin' directory (along with 'java', etc.).
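
To confirm that the jar file contains the properties file, its table of contents can be listed with the same 'jar' utility. This is a minimal sketch, in which the function name 'jar_has_properties' is hypothetical. It checks only that an entry named 'harcimport.properties' exists at the top level of the jar file, not that its contents are correct.

```shell
# Hypothetical check (not part of the HARCIMPORT distribution).
# Returns 0 if the jar file has a top-level harcimport.properties entry.
jar_has_properties() {
    # $1 = pathname of the jar file (for example, harcimport.jar)
    # 'jar -tf' lists the entries, one per line; -x matches a whole line.
    jar -tf "$1" | grep -x 'harcimport.properties' > /dev/null
}
```

For example: '% jar_has_properties harcimport.jar && echo found'.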

Once these steps have been completed, the HARCIMPORT program is ready to be used, as shown in the usage section above. An example is given in the next section.


EXAMPLE OF THE JAVA COMMAND LINE

An example of a HARCIMPORT Java command is:

     java -jar harcimport.jar /var/sdatest/harcimportbatch.txt

This example assumes that you have written the desired batch import options into the file 'harcimportbatch.txt' in the directory '/var/sdatest'.
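
Because the program is given the full pathname of the options file, a small wrapper can catch a missing file or a relative pathname before Java starts. The sketch below is a convenience built on assumptions, not part of the distribution: the function name 'run_harcimport' and the 'HARCIMPORT_JAR' environment variable are hypothetical.

```shell
# Hypothetical wrapper (not part of the HARCIMPORT distribution).
run_harcimport() {
    # $1 = pathname of the batch options file
    opts=$1
    if [ ! -r "$opts" ]; then
        echo "options file not readable: $opts" >&2
        return 2
    fi
    # Turn a relative pathname into a full pathname.
    case $opts in
        /*) ;;
        *)  opts=$(pwd)/$opts ;;
    esac
    # HARCIMPORT_JAR is a hypothetical variable; by default the jar file
    # is expected in the current directory, as in the example above.
    java -jar "${HARCIMPORT_JAR:-harcimport.jar}" "$opts"
}
```

For example: '% run_harcimport /var/sdatest/harcimportbatch.txt'.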

SEE ALSO

HARC          HTML archive specification file
sdamanager    SDA Manager


CSM, UC Berkeley/ISA
October 2, 2019